The invention relates to a correction device for correcting text passages in a 
recognized text information which recognized text information is recognized by a speech 
recognition device from a speech information and which is therefore associated to the 
speech information. 

The invention further relates to a correction method for correcting text passages 
in a recognized text information which recognized text information is recognized by a 
speech recognition device from a speech information and which is therefore associated to 
the speech information. 

The invention also relates to a computer program product which comprises 
correction software of word correction software which is executed by a computer. 



Such a correction device and such a correction method are known e.g. from 
document US-A-6,173,259. The known correction device is realized by means of a 
computer executing word processing software of a corrector of a transcription service. 
The corrector is an employee who manually corrects text information, which text 
information is recognized from speech information automatically with a speech recognition 
program. 

The speech information in this case is a dictation generated by an author, which 
dictation is transmitted to a server via a computer network. The server distributes received 
speech information of dictations to various computers, each of which executes speech 
recognition software constituting a speech recognition device in this case. 

The known speech recognition device recognizes text information from the 
speech information of the dictation by the author sent to it, with link information also being 
established. The link information marks, for each word of the recognized text information, a 
part of the speech information for which the word was recognized by the speech 
recognition device. The speech information of the dictation and the recognized text 
information and the link information are transferred from the speech recognition device to 
the computer of the corrector for a correction process. 

The known correction device contains synchronous playback means, by which 
means a synchronous playback mode can be performed. When the synchronous playback 
mode is active in the correction device, the speech information of the dictation is played 
back while, in synchronism with each acoustically played-back word of the speech 
information, the word recognized from the played-back word by the speech recognition 
system is marked with an audio cursor. The audio cursor thus marks the position of the 
word that has just been acoustically played back in the recognized text information. 

In the event of an unsuitable or incorrect recognized text passage picked up by 
the corrector, the unsuitable or incorrect recognized text passage is replaced with a 
different, correct or suitable, text passage. Such correction work is extremely 
time-consuming, thereby considerably increasing the costs of the transcription. On the other 
hand, if the quality of the recognition and correction of the recognized text is to be at a 
maximum, the corrector has to listen to the whole sound and watch the whole 
recognized text. One of the aims, therefore, is to make the correction work following a 
recognition as rapid and efficient as possible with a maximum quality of the recognized 
or corrected text. 






It is an object of the invention to provide a correction device in accordance 
with the type mentioned in the first paragraph, a correction method in accordance with the 
type mentioned in the second paragraph and a computer program product in accordance 
with the type mentioned in the third paragraph with which the above-mentioned 
disadvantages and shortcomings are avoided. 

In order to achieve the above-mentioned object, in such a correction device 
features in accordance with the invention are provided so that the correction device can be 
characterized in the way set out in the following. 

A correction device for correcting text passages in a recognized text 
information which recognized text information is recognized by a speech recognition 
device from a speech information and which is therefore associated to the speech 
information, the correction device comprising: reception means for receiving the speech 
information and the associated recognized text information and a link information, which 
link information at each text passage of the associated recognized text information marks 
the part of the speech information at which the text passage was recognized by the speech 
recognition device, and a confidence level information, which confidence level information 
at each text passage of the recognized text information represents a correctness of the 
recognition of said text passage and comprising synchronous playback means for 
performing a synchronous playback mode, in which synchronous playback mode during an 
acoustic playback of the speech information the text passage of the recognized text 
information associated to the speech information just played back and marked by the link 
information is marked synchronously and comprising indication means for indicating the 
confidence level information of a text passage of the text information during the 
synchronous playback. 

In order to achieve the above-mentioned object, features in accordance with the 
invention are envisaged in such a correction method so that the correction method can be 
characterized in the way set out in the following. 

A correction method for correcting text passages in a recognized text 
information which recognized text information is recognized by a speech recognition 
device from a speech information and which is therefore associated to the speech 
information, in which the following steps are performed: receiving the speech information 
and the associated recognized text information and a link information, which link 
information at each text passage of the associated recognized text information marks the 
part of the speech information at which the text passage was recognized by the speech 
recognition device, and a confidence level information, which confidence level information 
at each text passage of the recognized text information represents a correctness of the 
recognition of said text passage; performing a synchronous playback mode, in which 
synchronous playback mode during acoustic playback of the speech information the text 
passage of the recognized text information associated to the speech information just played 
back and marked by the link information is marked synchronously; indicating the 
confidence level information of a text passage of the text information during the 
synchronous playback. 

In order to achieve the above-mentioned object, such a computer program 
product includes features in accordance with the invention so that the computer program 
product can be characterized in the way set out in the following. 

A computer program product for a computer, comprising software code 
portions for performing the steps of the above-mentioned correction method when said 
product is run on the computer. 

By virtue of the characteristic features of the invention, it is achieved in a 
relatively simple way that, for example, a corrector of a transcription system using a 
correction device according to the invention is able to make the correction work following a 
recognition relatively rapid and efficient while ensuring the best quality of the recognized 
or corrected text information. In particular, indicating the confidence level 
information of a text passage of the recognized text information during the synchronous 
playback, rather than as an at-once and permanent indication of the confidence value of all 
text passages of the text information, has the advantage that the corrector can easily 
recognize a wrong or incorrect text passage without being distracted by or concentrating on 
the permanent indications. 

In the embodiments according to the invention, it has been proved to be 
advantageous when measures as claimed in claim 2 and claim 7 are provided. The corrector 
does not only focus on individual passages, but on the whole document, thereby 
guaranteeing higher quality and accuracy. 

In an embodiment according to the invention the indicating of the confidence 
level information of a text passage of the text information may be performed acoustically. 
In the embodiments according to the invention, it has proved to be very advantageous when 
measures as claimed in claim 3 and claim 8 are provided. The visual feedback serves as a 
signal, a means of increasing the attention on a particular text passage to the corrector. 

It has further proved to be very advantageous in the embodiments according to 
the invention when measures as claimed in claim 4 and claim 9 are provided. By changing 
the speed of the playback for a particular section of the dictation automatically in 
dependence of the confidence level information, the attention of the corrector is increased 
resulting in an increased accuracy of the corrected text information. For example, an 
automatic slow down of the playback speed may be performed for a text passage with a 
lower confidence level. 

In the embodiments according to the invention, it has further been proved to be 
advantageous when measures as claimed in claim 5 and claim 10 are provided. By this the 
accuracy of the corrected text may further be improved. 

The invention will be better understood according to the following description 
explaining the physical basis of the invention based on the enclosed drawing showing a 
preferred embodiment of the latter as a non-limitative example of implementation. 

Figure 1 shows, in accordance with this invention, a correction system in the form 
of a block diagram. 



Figure 1 shows a correction system 1 which comprises a computer 1a. By 
means of the computer 1a speech recognition software and text processing software are 
executed. The correction system 1 has a speech signal input 2 and input means 3 and a foot 
switch 4 and a loudspeaker 5 and a screen 6 connected to it. In this case the input means 3 
are realized by a keyboard and a mouse. 

A speech signal SS is received at the speech signal input 2 and transferred to a 
speech engine 7. The speech signal SS in this case is a dictation received from a server via 
a network (not shown). A detailed description of receiving such a speech signal SS can be 
derived from document US 6,173,259 B1, which document is herewith incorporated by 
reference. 

The speech engine 7 contains an A/D converter 8. By means of the A/D 
converter 8 the speech signal SS is digitized, whereupon the A/D converter 8 transfers 
digital speech data DS to a speech recognizer 9. 

The speech recognizer 9 is designed to recognize text information assigned to 
the received digital speech data DS. In the following said text information is referred to as 
recognized text information RTI. The speech recognizer 9 is further designed to establish 
link information LI which for each text passage of the recognized text information RTI 
marks the part of the digital speech data DS at which the text passage has been recognized 
by the speech recognizer 9. Such a speech recognizer 9 is known, for example, from the 
document US-A-5,031,113, the disclosure of which is deemed to be included in the 
disclosure of this document by this reference. 
Those skilled in the art will appreciate that the information provided by the 
speech recognizer 9 for each recognized text passage can be statistically analyzed. In 
particular, the speech recognizer 9 can provide a score indicative of the confidence level 
assigned by the speech recognizer 9 to a particular recognition of a particular word. These 
scores are analyzed by a confidence level scorer 10 of the speech recognizer 9. In the 
following said scores are referred to as confidence level information CLI. 

The speech engine 7 also comprises memory means 11. By means of said 
memory means 11 the digital speech data DS delivered by the speech recognizer 9 are 
stored along with the recognized text information RTI and the link information LI and the 
confidence level information CLI of the speech signal SS. 
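
As an illustration only, the data held in the memory means 11 for one dictation could be modelled as sketched below. The class and field names, the millisecond units and the 0.0 to 1.0 confidence range are assumptions made for this sketch; the description does not prescribe any particular data layout.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TextPassage:
    """One recognized text passage (here a single word) with its metadata."""
    text: str          # part of the recognized text information RTI
    start_ms: int      # link information LI: start of the matching audio span (assumed ms)
    end_ms: int        # link information LI: end of the matching audio span (assumed ms)
    confidence: float  # confidence level information CLI, assumed in 0.0..1.0


@dataclass
class DictationRecord:
    """Everything stored for one dictation (hypothetical layout)."""
    speech_data: bytes           # digital speech data DS
    passages: List[TextPassage]  # RTI, LI and CLI, one entry per text passage

    def recognized_text(self) -> str:
        """Join the passages into the recognized text information RTI."""
        return " ".join(p.text for p in self.passages)


# Made-up sample values for illustration.
record = DictationRecord(
    speech_data=b"\x00" * 16,
    passages=[
        TextPassage("text", 0, 400, 0.95),
        TextPassage("recognized", 400, 1100, 0.90),
        TextPassage("by", 1100, 1300, 0.55),
    ],
)
```

Keeping the link information and the confidence value on the same record as the word means that the playback and indication steps can look up both with a single index.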

The correction system 1 also comprises a correction device 12 for recognizing 
and correcting wrong or unsuitable recognized text or words. The correction device 12 is 
realized by the computer 1a processing the text editing software, which editing 
[...] text information. Correction device 12 is further referred to as correction software 12 and 
contains editing means 13 and synchronous playback means 14. 

The editing means 13 are designed to position a text cursor at a text passage 
that has to be changed or an incorrect text passage of the recognized text information RTI 
and to edit the recognized text passage in accordance with editing information EI entered 
by a user of the correction system 1, which user is a corrector in this case. The editing 
information EI in this case is entered by a user with keys of the keyboard of the input 
means 3, in a generally known manner. 

The synchronous playback means 14 allow a synchronous playback 
mode of the correction system 1, in which synchronous playback mode the text passage of 
the recognized text information RTI marked by the link information LI concerning the 
speech information just played back is synchronously marked during an acoustic playback 
of the speech information of the dictation. Such a synchronous playback mode is known, 
for example, from the document WO 01/46853 A1, the disclosure of which is deemed to be 
included in the disclosure of this document through this reference. 

When the synchronous playback mode is active, audio data of the dictation 
which is stored in the memory means 11 as digital speech data DS can be read out by the 
synchronous playback means 14 and continuously transferred to a D/A converter 15. The 
D/A converter 15 then converts the digital speech data DS into speech signal SS. Said 
speech signal SS is downstream transferred to the loudspeaker 5 for acoustic playback of 
the dictation. 

To activate the synchronous playback mode, the user of the correction system 1 
can place his foot on one of two switches provided by the foot switch 4, whereupon control 
information CI is transferred to the synchronous playback means 14. Then the synchronous 
playback means 14, in addition to the digital speech data DS of the dictation, also read out 
the link information LI stored for said dictation in the memory means 11. 

In synchronous playback mode, the synchronous playback means 14 are further 
designed to generate and transfer audio cursor information ACI to the editing means 13. 
Immediately after the activation of the synchronous playback mode the editing means 13 
are designed to read out the recognized text information RTI from the memory means 11 
and to temporarily store it as text information TI to be displayed. Said temporarily stored 
text information TI to be displayed corresponds for the time being to the recognized text 
information RTI and may be corrected by the corrector by corrections to incorrect text 
passages in order to ultimately achieve error-free text information. 

The text information TI temporarily stored in the editing means 13 is 
transferred from the editing means 13 to image processing means 17. The image processing 
means 17 process the text information TI to be displayed and transfer presentable display 
information DI to the screen 6. Said display information DI contains the text information 
TI to be displayed. 

As already mentioned, the display process is windows-based. For the user the 
following is recognizable during the synchronous playback. Primarily, a window on the 
screen or display is filled with the recognized text. The recognized word corresponding to a 
speech segment, or the audio data which is played back as already mentioned 
above, is indicated by highlighting the word on the screen. As such, the highlighting 
follows the playback of the speech. 
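
The way the highlighting follows the playback can be sketched as a lookup from the current playback position to the word whose audio span, as marked by the link information LI, contains that position. The sample spans, the millisecond units, the function names and the bracket notation standing in for on-screen highlighting are all assumptions of this sketch, not part of the described embodiment.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

# (word, start_ms, end_ms): hypothetical link information for three words,
# with a short pause between the second and third word.
LINKS: List[Tuple[str, int, int]] = [
    ("text", 0, 400),
    ("recognized", 400, 1100),
    ("by", 1200, 1400),
]
STARTS = [start for _, start, _ in LINKS]  # sorted start times for binary search


def active_word_index(position_ms: int) -> Optional[int]:
    """Return the index of the word being played at position_ms, or None in a pause."""
    i = bisect_right(STARTS, position_ms) - 1
    if i < 0:
        return None
    _, start, end = LINKS[i]
    return i if start <= position_ms < end else None


def render(position_ms: int) -> str:
    """Render the text with the active word in brackets (stand-in for highlighting)."""
    active = active_word_index(position_ms)
    return " ".join(
        f"[{word}]" if i == active else word
        for i, (word, _, _) in enumerate(LINKS)
    )
```

Calling `render` with the current playback position on every display refresh would move the bracketed word along with the audio; during a pause no word is marked.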

In the embodiment shown in figure 1 the editing means 13 contain indication 
means 16. The indication means 16 are constructed for indicating the confidence level 
information CLI of a text passage of the text information TI to be displayed during the 
synchronous playback, which confidence level information CLI is received from the 
memory means 11. In this case the text passage is a single word. It may be observed that 
the confidence level of so-called bigrams or trigrams or phrases of the recognized text 
information may be indicated. 

It may further be observed that the indication means 16 may be a separate block 
within the correction device 12 being connected to the editing means 13 and/or the 
synchronous playback means 14 and receiving confidence level information CLI and audio 
cursor information ACI and recognized text information RTI and outputting text 
information TI with a confidence value indication. 

In the present embodiment, the indication is performed by applying a color 
attribute to each word which is currently "active" in the synchronous playback, which 
means the word which is played back. A threshold level, or confidence limit, is 
settable before starting the synchronous playback mode. The confidence limit may lie, for 
example, at 80% of a maximum confidence value range of the confidence level information 
CLI stored in the memory means 11. Accordingly, for each "active" word an inquiry takes 
place as to whether the confidence level information CLI of said word is smaller than, equal 
to or greater than the threshold level. If the threshold level is undershot or equaled, the 
"active" word is marked, that is, a color attribute different from a default color attribute is 
assigned, resulting in a different color highlighting on screen 6. 
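
The threshold inquiry described above can be sketched as follows. The color names and the function signature are hypothetical; only the rule itself (a word at or below the confidence limit receives a non-default color attribute, with the limit given as a fraction of the maximum confidence value) is taken from the description.

```python
DEFAULT_COLOR = "yellow"  # hypothetical default highlight color
LOW_CONF_COLOR = "red"    # hypothetical color for words at or below the limit


def highlight_color(confidence: float,
                    max_confidence: float = 1.0,
                    limit_fraction: float = 0.8) -> str:
    """Pick the color attribute for the currently "active" word.

    The threshold is limit_fraction of the maximum confidence value
    (80% in the example of the description). A confidence that undershoots
    or equals the threshold gets the non-default color.
    """
    threshold = limit_fraction * max_confidence
    return LOW_CONF_COLOR if confidence <= threshold else DEFAULT_COLOR
```

Because the color is computed only for the active word, no permanent marking of all low-confidence words appears in the displayed text.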

Being notified about the confidence level of a word of the text information TI 
just during the synchronous playback, rather than as a permanent indication of the 
confidence value information CLI of all words in the displayed text information TI, has the 
advantage that the corrector can easily recognize a wrong or incorrect word without being 
distracted by or concentrating on the permanent indications. 

It may be observed that other visual indications may be used to indicate a 
confidence level information CLI of a word when synchronous playback takes place, for 
example, the word may be shown bold or underlined. Furthermore, instead of marking the 
word, a separate indication at the text window may be provided in the form of a flash-light, 
which flash-light indicates the confidence level information CLI, or the 
confidence value, of the "active" word. By this, a corrector just needs to concentrate on the 
flash-light in a fixed position rather than - in synchronous playback mode - following the 
"active" words in the text displayed and/or highlighted on screen 6. 

Since a playback speed in synchronous playback mode may be comparatively 
fast, the playback speed may be changed automatically in dependence of the confidence 
level. For example, the playback speed for a word with 80% of a maximum confidence 
value may be reduced by half of the normal playback speed of a word with the maximum 
confidence value, thus correctly recognized. 
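
A minimal sketch of such a confidence-dependent playback speed is given below. The description gives only the single example of halving the speed at 80% of the maximum confidence value; the step-function shape, the parameter names and the millisecond units are assumptions of this sketch.

```python
def playback_rate(confidence: float,
                  max_confidence: float = 1.0,
                  limit_fraction: float = 0.8,
                  slow_rate: float = 0.5) -> float:
    """Return the playback rate for a word (1.0 = normal speed).

    Assumed rule: words at or below the confidence limit are played at
    slow_rate (half speed by default), all other words at normal speed.
    """
    threshold = limit_fraction * max_confidence
    return slow_rate if confidence <= threshold else 1.0


def playback_duration_ms(start_ms: int, end_ms: int, rate: float) -> float:
    """Time needed to play the audio span [start_ms, end_ms) at the given rate."""
    return (end_ms - start_ms) / rate
```

Under these assumptions a 400 ms word with a confidence at the limit takes 800 ms to play back, which gives the corrector more time to compare the audio with the marked word.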

It may further be observed that the indicating of the confidence level 
information CLI, or the confidence value, in accordance with the invention may be 
performed acoustically. In this case a sound signal may be generated and emitted via a 
loudspeaker. A different pitch or a different loudness or volume of the generated sound 
signal may be used to indicate a different confidence value. 
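
One conceivable mapping from confidence value to pitch is sketched below. The frequency range, the linear mapping, the choice that lower confidence produces a higher pitch, and all names are assumptions of this sketch; the description only states that a different pitch or loudness may indicate a different confidence value.

```python
import math
from typing import List

LOW_HZ = 440.0   # hypothetical pitch for the maximum confidence value
HIGH_HZ = 880.0  # hypothetical pitch for the minimum confidence value


def cue_frequency(confidence: float) -> float:
    """Map a confidence in 0.0..1.0 linearly onto the pitch range.

    Values outside the range are clamped. Lower confidence gives a higher pitch.
    """
    c = min(max(confidence, 0.0), 1.0)
    return HIGH_HZ - c * (HIGH_HZ - LOW_HZ)


def cue_samples(confidence: float,
                duration_ms: int = 50,
                sample_rate: int = 8000) -> List[float]:
    """Generate a short sine tone (samples in -1.0..1.0) for the given confidence."""
    freq = cue_frequency(confidence)
    n = sample_rate * duration_ms // 1000
    return [math.sin(2.0 * math.pi * freq * t / sample_rate) for t in range(n)]
```

The resulting samples would still have to be mixed into or played alongside the dictation audio, which is outside the scope of this sketch.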

It may be observed further that the indicating of the confidence level 
information CLI respectively the confidence value in accordance with the invention may be 
performed by means of vibrations. In this case additionally vibration means are provided 
which vibration means can be brought into a contact with the user respectively corrector 
and in which the corrector may feel or sense vibrations in dependence of the confidence 
value of a word played back in the synchronous playback mode. 

As already mentioned the correction system 1 is implemented on a 
conventional computer, such as a PC or workstation. It should be mentioned that portable 
equipment, such as personal digital assistants (PDAs), laptops or mobile phones may be 
equipped with a correction system and/or speech recognition. The functionality described 
by the invention is typically executed using the processor of the device. The processor, 
such as a PC-type processor, micro-controller or DSP-like processor, can be loaded with a 
program to perform the steps according to the invention. Such a computer program product 
is usually loaded from a background storage, such as a hard disk or ROM. The computer 
program product can initially be stored in the background storage after having been 
distributed on a storage medium, like a CD-ROM, or via a network, like the public internet. 
1. A correction device (12) for correcting text passages in a recognized text 
information (RTI) which recognized text information (RTI) is recognized by a speech 
recognition device from a speech information and which is therefore associated to the 
speech information, the correction device (12) comprising: 

reception means (13, 14) for receiving the speech information and the associated 
recognized text information (RTI) and a link information, which link information at each 
text passage of the associated recognized text information (RTI) marks the part of the 
speech information at which the text passage was recognized by the speech recognition 
device, and a confidence level information (CLI), which confidence level information 
(CLI) at each text passage of the recognized text information (RTI) represents a correctness 
of the recognition of said text passage and comprising 

synchronous playback means (14) for performing a synchronous playback mode, in which 
synchronous playback mode during an acoustic playback of the speech information the text 
passage of the recognized text information (RTI) associated to the speech information just 
played back and marked by the link information is marked synchronously and comprising 
indication means (16) for indicating the confidence level information (CLI) of a text 
passage of the text information during the synchronous playback. 

2. A correction device (12) as claimed in claim 1, in which the indication 
means (16) are constructed for indicating the confidence level information (CLI) of the text 
passage just played back. 

3. A correction device (12) as claimed in one of the claims 1 to 2, in which 
the indication means (16) are constructed for indicating the confidence level by means of a 
visual indication. 

4. A correction device (12) as claimed in one of the claims 1 to 3, in which 
the playback means (14) are constructed to change a playback speed during the acoustic 
playback in dependence of the confidence level information (CLI). 

5. A correction device (12) as claimed in one of the claims 1 to 4, in which 
the indication means (16) are constructed for indicating the confidence level information 
(CLI) of phrases. 

6. A correction method for correcting text passages in a recognized text 
information (RTI) which recognized text information (RTI) is recognized by a speech 
recognition device from a speech information and which is therefore associated to the 
speech information, in which the following steps are performed: 

receiving the speech information and the associated recognized text information (RTI) and 
a link information, which link information at each text passage of the associated recognized 
text information (RTI) marks the part of the speech information at which the text passage 
was recognized by the speech recognition device, and a confidence level information (CLI), 
which confidence level information (CLI) at each text passage of the recognized text 
information (RTI) represents a correctness of the recognition of said text passage; 
performing a synchronous playback mode, in which synchronous playback mode during 
acoustic playback of the speech information the text passage of the recognized text 
information (RTI) associated to the speech information just played back and marked by the 
link information is marked synchronously; 

indicating the confidence level information (CLI) of a text passage of the text information 
during the synchronous playback. 

7. A correction method as claimed in claim 6, in which an indicating of the 
confidence level information (CLI) of the text passage just played back is performed. 

8. A correction method as claimed in one of the claims 6 to 7, in which the 
indicating of the confidence level information (CLI) is performed by means of a visual 
indication. 

9. A correction method as claimed in one of the claims 6 to 8, in which a 
change of a playback speed is performed during the acoustic playback in dependence of the 
confidence level information (CLI). 

10. A correction method as claimed in one of the claims 6 to 9, in which at the 
indicating of the confidence level information (CLI) the indication of the confidence level 
information (CLI) of phrases is performed. 

11. A computer program product for a computer (1a), comprising software 
code portions for performing the steps of at least one of claims 6 to 10 when said product is 
run on the computer (1a). 

12. A computer program product according to claim 11, wherein said 
computer program product comprises a computer-readable medium on which said software 
code portions are stored. 






PHAT030016 






Fig. 1 (drawing not reproduced; legible labels: "TC" and the sample screen text 
"TEXT RECOGNIZED [illegible] SPEECH RECOGNITION SOFTWARE CONTAINS PERHAPS WRONG WORDS") 






PCT/IB2004/050360 



